
Fix server state adapter limit #74

Merged
Yunnglin merged 3 commits into main from server_state on Feb 26, 2026
Conversation

@Yunnglin
Collaborator

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Write the detailed information that belongs to this PR.

Experiment results

Paste your experiment results here (if needed).

Copilot AI review requested due to automatic review settings on February 26, 2026 08:16
@gemini-code-assist
Contributor

Summary of Changes

Hello @Yunnglin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the server's internal state management, transitioning from a single, complex ServerState class to a more organized, manager-based architecture. This change enhances modularity and maintainability by delegating specific state concerns to dedicated components. Concurrently, the mechanism for limiting resource usage has been streamlined, replacing token-specific adapter limits with a server-global model limit, which simplifies configuration and enforcement.

Highlights

  • Refactored Server State Management: The monolithic ServerState class has been replaced by a modular design, introducing dedicated managers for sessions, models, sampling, futures, and configuration, enhancing maintainability and clarity.
  • Updated Adapter/Model Limit Logic: The per_token_adapter_limit configuration has been removed and replaced with a per_token_model_limit that is enforced server-globally, simplifying how model limits are applied across the system.
  • Configuration File Updates: Multiple server_config.yaml files were updated to reflect the new per_token_model_limit and remove the old per_token_adapter_limit parameter.
  • Improved Model Creation Error Handling: The create_model logic now ensures model_id registration occurs within the async adapter creation process and includes a conditional cleanup for model_id in case of exceptions.
  • Documentation Synchronization: English and Chinese usage guides were updated to align with the new configuration parameters and reflect the changes in model limit enforcement.
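
The configuration change described in the highlights can be sketched as a before/after YAML fragment. Only the two limit keys (`per_token_adapter_limit`, `per_token_model_limit`) and their nesting under `adapter_config`/`server_config` come from this PR; the surrounding structure is illustrative.

```yaml
# Before: the limit lived in each adapter config (removed by this PR)
# adapter_config:
#   per_token_adapter_limit: 3

# After: a single server-global limit, enforced at model registration
server_config:
  per_token_model_limit: 3
```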


Changelog
  • cookbook/client/tinker/megatron/server_config.yaml
    • Updated server configuration to include per_token_model_limit and removed per_token_adapter_limit.
  • cookbook/client/tinker/megatron/server_config_7b.yaml
    • Adjusted server configuration by adding per_token_model_limit, removing per_token_adapter_limit, and commenting out a sampler service.
  • cookbook/client/tinker/self_congnition.py
    • Removed an unused numpy import.
  • cookbook/client/tinker/transformer/server_config.yaml
    • Modified server configuration to incorporate per_token_model_limit and eliminate per_token_adapter_limit.
  • cookbook/client/twinkle/megatron/server_config.yaml
    • Applied server configuration changes, adding per_token_model_limit and removing per_token_adapter_limit.
  • cookbook/client/twinkle/transformer/server_config.yaml
    • Updated server configuration across multiple sections, introducing per_token_model_limit and removing per_token_adapter_limit.
  • docs/source_en/Usage Guide/Server and Client/Server.md
    • Synchronized English documentation to reflect the new per_token_model_limit and the removal of per_token_adapter_limit.
  • docs/source_zh/使用指引/服务端和客户端/服务端.md
    • Synchronized Chinese documentation to reflect the new per_token_model_limit and the removal of per_token_adapter_limit.
  • src/twinkle/server/tinker/model.py
    • Refined model creation logic to ensure model_id registration occurs within the async adapter creation and added a conditional check for model_id during exception cleanup.
  • src/twinkle/server/tinker/server.py
    • Removed an unnecessary dataclasses import.
  • src/twinkle/server/twinkle/model.py
    • Removed a redundant comment regarding adapter limit checks.
  • src/twinkle/server/utils/adapter_manager.py
    • Removed the per_token_adapter_limit parameter, _adapter_counts attribute, and the check_adapter_limit method, deprecating the old adapter limit enforcement.
  • src/twinkle/server/utils/state.py
    • Removed the entire legacy ServerState file, indicating a complete architectural shift.
  • src/twinkle/server/utils/state/__init__.py
    • Added a new __init__.py file to establish the new modular state management package structure.
  • src/twinkle/server/utils/state/base.py
    • Introduced a new BaseManager abstract class to provide common functionalities for state managers.
  • src/twinkle/server/utils/state/config_manager.py
    • Implemented a new ConfigManager for handling key-value configuration entries.
  • src/twinkle/server/utils/state/future_manager.py
    • Developed a new FutureManager to manage the status and lifecycle of asynchronous tasks.
  • src/twinkle/server/utils/state/model_manager.py
    • Created a new ModelManager responsible for registering and managing models, including the new per_token_model_limit enforcement.
  • src/twinkle/server/utils/state/models.py
    • Defined new Pydantic models (SessionRecord, ModelRecord, SamplingSessionRecord, FutureRecord) to provide structured data for state records.
  • src/twinkle/server/utils/state/sampling_manager.py
    • Developed a new SamplingSessionManager to oversee sampling sessions.
  • src/twinkle/server/utils/state/server_state.py
    • Implemented the new ServerState class, which now acts as a composite of the specialized managers, along with a ServerStateProxy for Ray actor integration.
  • src/twinkle/server/utils/state/session_manager.py
    • Introduced a new SessionManager to handle client session creation, updates, and cleanup.
Activity
  • The pull request addresses a bug fix related to server state adapter limits.
  • The author, Yunnglin, initiated a significant refactoring effort to improve the server's state management architecture.
  • The changes involve updating configuration files, core logic, and documentation to reflect the new modular design and model limit enforcement.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant and beneficial refactoring of the server state management by breaking down the monolithic ServerState class into smaller, more focused manager classes. This greatly improves modularity and maintainability. The adapter/model limit has been moved from a per-model configuration to a global server setting, enforced within the new ModelManager, which aligns with the PR's objective. While the refactoring is well-executed, I've identified a few areas for improvement: a potentially incomplete implementation of the new limit for the Twinkle protocol, some code duplication in the new state manager, and a commented-out service in a configuration file that should be cleaned up.


Copilot AI left a comment


Pull request overview

This PR refactors the server state management system to centralize per-token model limits at the ServerState level rather than enforcing them at the adapter level. The primary goal is to fix a bug where adapter limits were checked too late in the model creation flow, allowing users to exceed the limit under certain conditions.

Changes:

  • Refactored single-file state.py into a modular package structure (state/) with separate managers for sessions, models, sampling sessions, futures, and configuration
  • Moved per-token limit enforcement from AdapterManager to ModelManager within ServerState, checking the limit during model registration rather than adapter creation
  • Updated model creation flow in tinker/model.py to register models inside the _create_adapter coroutine, ensuring atomicity and proper error handling
  • Updated YAML configuration files to specify per_token_model_limit in server_config instead of per_token_adapter_limit in adapter_config

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

Show a summary per file

  • src/twinkle/server/utils/state.py: Deleted monolithic state file
  • src/twinkle/server/utils/state/base.py: New abstract base manager class with common CRUD and cleanup operations
  • src/twinkle/server/utils/state/models.py: Pydantic models for session, model, sampling session, and future records
  • src/twinkle/server/utils/state/session_manager.py: Session lifecycle management with heartbeat tracking
  • src/twinkle/server/utils/state/model_manager.py: Model registration with per-token limit enforcement
  • src/twinkle/server/utils/state/sampling_manager.py: Sampling session lifecycle management
  • src/twinkle/server/utils/state/future_manager.py: Async task future/request status tracking
  • src/twinkle/server/utils/state/config_manager.py: Key-value configuration storage
  • src/twinkle/server/utils/state/server_state.py: Unified ServerState class composing all managers, plus Ray proxy and factory
  • src/twinkle/server/utils/state/__init__.py: Package exports
  • src/twinkle/server/utils/adapter_manager.py: Removed per_token_adapter_limit parameter and check_adapter_limit method
  • src/twinkle/server/twinkle/model.py: Updated comment to reflect that the limit check no longer happens in register_adapter
  • src/twinkle/server/tinker/server.py: Removed unused dataclasses import
  • src/twinkle/server/tinker/model.py: Moved model registration inside the _create_adapter coroutine, added a null check before cleanup, removed model_id from the schedule_task call, updated a comment
  • cookbook/client/twinkle/transformer/server_config.yaml: Added server_config.per_token_model_limit: 3, removed adapter_config.per_token_adapter_limit
  • cookbook/client/twinkle/megatron/server_config.yaml: Added server_config.per_token_model_limit: 3, removed adapter_config.per_token_adapter_limit
  • cookbook/client/tinker/transformer/server_config.yaml: Added server_config.per_token_model_limit: 3, removed adapter_config.per_token_adapter_limit
  • cookbook/client/tinker/self_congnition.py: Removed unused numpy import
  • cookbook/client/tinker/megatron/server_config_7b.yaml: Added server_config.per_token_model_limit: 1, removed adapter_config.per_token_adapter_limit, commented out sampler service
  • cookbook/client/tinker/megatron/server_config.yaml: Added server_config.per_token_model_limit: 3, removed adapter_config.per_token_adapter_limit from model and sampler

@Yunnglin Yunnglin merged commit d4c5db5 into main Feb 26, 2026
6 of 8 checks passed
@Yunnglin Yunnglin deleted the server_state branch March 1, 2026 03:38
